In this problem, we will study fluctuations in currency exchange rate over time.
The file USD-JPY.csv contains the daily exchange
rate of USD/JPY from January 2000 through May 31, 2022. We will
aggregate the data on a weekly basis, by taking the average rate within
each week. The time series of interest is the weekly exchange rate.
We will analyze this time series and its first order difference.
To read the data in R, save the file in your working
directory (make sure you have changed the directory if different from
the R working directory) and read the data using the R
function read.csv()
fpath <- "USD-JPY.csv"
df <- read.csv(fpath, header = TRUE)
Here we load the libraries needed for this data analysis:
library(mgcv)
library(lubridate)
library(dplyr)
To prepare the data, run the following code snippet. First, aggregate by week:
df$date <- as.Date(df$Date, format='%Y-%m-%d')
df$week <- floor_date(df$date, "week")
df <- df[, c("week", "jpy")]
We now form the weekly aggregated time series to use for data exploration. Please note that we will analyze the weekly aggregated data, not the original (daily) data.
agg <- aggregate(x = df$jpy, by = list(df$week), FUN = mean)
colnames(agg) <- c("week", "jpy")
jpy.ts <- ts(agg$jpy, start = 2000, freq = 52)
Use the jpy.ts series to code and answer the following questions. Before exploring the data, can you infer the data features from what you know about the USD/JPY exchange rate? Next, plot the time series and ACF of the weekly data. Comment on the main features, and identify what (if any) assumptions of stationarity are violated.
Which type of model do you think will fit the data better: a trend-fitting model or a seasonality-fitting model? Provide details for your response.
Response: General Insights on the USD-JPY Currency Rate
The exchange rate between two currencies generally follows a trend, driven by the trade policies and international relations of the two countries. For example, some currency rates have varied dramatically over the last 100 years compared to others; recall the debate around the USD vs. the Chinese yuan. The Indian rupee has weakened considerably against the USD over the past 100 years, and that series would clearly follow a downward trend. For the USD/JPY exchange, the rate has followed a downward trend over the long term but has fluctuated in the recent short term.
ts.plot(jpy.ts, col = "blue", xlab = "", ylab = "USD/JPY", main = "USD/JPY Exchange Rate over Time")
grid()
acf(jpy.ts, lag.max = 52 * 4, xlab = "Lag", ylab = "ACF", main = "USD/JPY ACF Analysis")
Response: General Insights from the Graphical Analysis
From the time series plot, we see that the variance fluctuates significantly within the time window. The trend also varies greatly within the specified time period, so the variability depends on time for this series. From the ACF plot, we see that the autocorrelation is significant and decays slowly across all lags.
From the two plots, a trend is clearly present, but no seasonality is observed. Hence, a trend-fitting model would likely fit this time series better than a seasonality model.
Fit the following trend estimation models:
Moving Average
Parametric Quadratic Polynomial
Local Polynomial
Splines Smoothing
Overlay the fitted values on the original time series. Plot the residuals with respect to time for each model, and plot the ACF of the residuals for each model as well. Comment on the fit of the four models and on the appropriateness of the stationarity assumption for the residuals.
# convert X axis to 0-1 scale
points <- 1:length(jpy.ts)
points <- (points - min(points)) / max(points)
# 1. Fit a moving average model
mav.model <- ksmooth(points, jpy.ts, kernel = "box")
mav.fit <- ts(mav.model$y, start = 2000, frequency = 52)
# 2. Fit a parametric quadratic polynomial model
x1 <- points
x2 <- points^2
para.model <- lm(jpy.ts ~ x1 + x2)
para.fit <- ts(fitted(para.model), start = 2000, frequency = 52)
# 3. Fit a local polynomial model
loc.model <- loess(jpy.ts ~ points)
loc.fit <- ts(fitted(loc.model), start = 2000, frequency = 52)
# 4. Fit a splines smoothing model
gam.model <- gam(jpy.ts ~ s(points))
gam.fit <- ts(fitted(gam.model), start = 2000, frequency = 52)
ts.plot(jpy.ts, xlab = "", ylab = "USD/JPY", main = "Trend Estimation Comparison")
grid()
lines(mav.fit, lwd = 2, col = "red")
lines(para.fit, lwd = 2, col = "orange")
lines(loc.fit, lwd = 2, col = "green")
lines(gam.fit, lwd = 2, col = "blue")
legend("bottomleft", legend = c("MAV", "PARA", "LOC", "GAM"),
col = c("red", "orange", "green", "blue"), lwd = 2)
Response: Comparison of the fitted trend models:
Of the four trend estimation methods tested, the splines smoothing model appears to capture the data's trend most effectively; the other three models fail to adequately capture the notable drop in the USD/JPY exchange rate in the middle of the time period.
# Residual and Residual ACF plots of the residuals from the fitted models
diff.mav <- jpy.ts - mav.fit
diff.para <- jpy.ts - para.fit
diff.loc <- jpy.ts - loc.fit
diff.gam <- jpy.ts - gam.fit
par(mfrow = c(2, 2))
ts.plot(diff.mav, xlab = "", ylab = "Residuals", main = "Moving Average")
ts.plot(diff.para, xlab = "", ylab = "Residuals", main = "Parametric Quadratic Polynomial")
ts.plot(diff.loc, xlab = "", ylab = "Residuals", main = "Local Polynomial")
ts.plot(diff.gam, xlab = "", ylab = "Residuals", main = "Splines Trend")
par(mfrow = c(2, 2))
acf(diff.mav, lag.max = 52 * 4, xlab = "", ylab = "Residuals", main = "Moving Average")
acf(diff.para, lag.max = 52 * 4, xlab = "", ylab = "Residuals",
main = "Parametric Quadratic Polynomial")
acf(diff.loc, lag.max = 52 * 4, xlab = "", ylab = "Residuals", main = "Local Polynomial")
acf(diff.gam, lag.max = 52 * 4, xlab = "", ylab = "Residuals", main = "Splines Trend")
Response: Appropriateness of the trend model for stationarity
The residuals from the trend models show clear non-stationarity, suggesting that trend removal alone, using any of the four models, is not sufficient to account for the non-stationary variation in the time series.
The ACFs of the residuals support this observation; each plot shows slowly declining autocorrelations, which are indicative of a remaining trend in the residuals.
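The graphical evidence can optionally be corroborated with a unit-root test; one common choice is adf.test() from the tseries package (an extra dependency, not loaded in this analysis). A self-contained sketch on simulated data, which mimics the slowly decaying residual ACF seen here:

```r
# Hedged sketch (assumes the 'tseries' package is installed; it is not
# part of the original analysis): an augmented Dickey-Fuller test on a
# simulated random walk, which is non-stationary by construction.
library(tseries)
set.seed(1)
rw <- cumsum(rnorm(500))      # random walk: non-stationary level series
adf.test(rw)$p.value          # large p-value: cannot reject a unit root
adf.test(diff(rw))$p.value    # small p-value: differencing restores stationarity
```

Applying adf.test() to diff.mav, diff.para, diff.loc, and diff.gam would give a numerical check of the same graphical conclusion.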
Now plot the differenced time series and its ACF. Apply the four trend models from Question 1b to the differenced time series. What can you conclude about the differenced data in terms of stationarity? Which approach would you recommend (trend removal via fitting a trend vs. differencing) to obtain a stationary process?
Hint: When TS data are differenced, the resulting data set will have an NA in the first data element due to the differencing.
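As a quick illustration of the hint (note that base R's diff() actually shortens the series by one observation rather than inserting an NA, so the covariate vector must be trimmed to match, which is why points[-1] appears in the code below):

```r
# Minimal sketch with toy values (not the USD/JPY data): a first
# difference has one fewer element than the original series, so drop
# the first time point to keep x and y aligned.
x  <- c(101.5, 102.0, 101.2, 103.4)
dx <- diff(x)                    # length 3: c(0.5, -0.8, 2.2)
t  <- seq_along(x)               # 1 2 3 4
t.trimmed <- t[-1]               # 2 3 4, same length as dx
length(dx) == length(t.trimmed)  # TRUE
```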
ts.plot(diff(jpy.ts), col = "black", xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced USD/JPY Exchange Rate by Time")
grid()
acf(diff(jpy.ts), lag.max = 52 * 4, xlab = "Lag", ylab = "ACF ", main = "USD/JPY ACF Analysis")
# 1. Fit a moving average model
mav.model <- ksmooth(points[-1], diff(jpy.ts), kernel = "box")
mav.fit <- ts(mav.model$y, start = 2000, frequency = 52)
ts.plot(diff(jpy.ts), xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Moving Average Analysis")
grid()
lines(mav.fit, lwd = 2, col = "red")
# 2. Fit a parametric quadratic polynomial model
x1 <- points[-1]
x2 <- points[-1] ^ 2
para.model <- lm(diff(jpy.ts) ~ x1 + x2)
para.fit <- ts(fitted(para.model), start = 2000, frequency = 52)
ts.plot(diff(jpy.ts), xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Parametric Quadratic Polynomial Analysis")
grid()
lines(para.fit, lwd = 2, col = "orange")
# 3. Fit a local polynomial model
loc.model <- loess(diff(jpy.ts) ~ points[-1])
loc.fit <- ts(fitted(loc.model), start = 2000, frequency = 52)
ts.plot(diff(jpy.ts), xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Local Polynomial Analysis")
grid()
lines(loc.fit, lwd = 2, col = "green")
# 4. Fit a splines smoothing model
gam.model <- gam(diff(jpy.ts) ~ s(points[-1]))
gam.fit <- ts(fitted(gam.model), start = 2000, frequency = 52)
ts.plot(diff(jpy.ts),
xlab = "",
ylab = "Differenced USD/JPY",
main = "Differenced Splines Smoothing Analysis")
grid()
lines(gam.fit, lwd = 2, col = "blue")
# 5. Compare all estimated trends
vals <- c(mav.fit, para.fit, loc.fit, gam.fit)
ylim <- c(min(vals), max(vals))
ts.plot(mav.fit, lwd = 2, col = "red", ylim = ylim,
xlab = "", ylab = "USD/JPY",
main = "Differenced Regression Model Comparison")
grid()
lines(para.fit, lwd = 2, col = "orange")
lines(loc.fit, lwd = 2, col = "green")
lines(gam.fit, lwd = 2, col = "blue")
legend("bottomright", legend = c("MAV", "PARA", "LOC", "GAM"),
col = c("red", "orange", "green", "blue"), lwd = 2)
Response: Comments about the stationarity of the differenced data:
The time series plots show that the models fit the differenced data appropriately and indicate that the differenced data is stationary.
The fitted moving average trend line has the least variability. The parametric quadratic model also shows little variability; the splines model shows more deviation in trend, and the local polynomial model the most, as seen in the combined graph. The moving average fit, however, has many 'kinks' that capture minor movements unlikely to be useful for determining the trend.
From this analysis, we can conclude that the differenced series is approximately stationary; hence differencing is a more effective approach than trend fitting for obtaining a stationary time series.
In this problem, we will analyze aggregated temperature data.
The file Everest Temp Jan-Mar 2021.csv contains the hourly average temperature at the Mount Everest Base Camp for the months of January to March 2021.
To read the data in R, save the file in your working
directory (make sure you have changed the directory if different from
the R working directory) and read the data using the R
function read.csv()
You will perform the analysis and modelling on the temp
data column.
fpath <- "Everest Temp Jan-Mar 2021.csv"
df <- read.csv(fpath, header = TRUE)
Here are the libraries you will need (lubridate is required for ymd_hms() below):
library(mgcv)
library(lubridate)
library(TSA)
library(dynlm)
library(ggplot2)
Run the following code to prepare the data for analysis:
df$timestamp <- ymd_hms(df$timestamp)
temp <- ts(df$temp, freq = 24)
datetime <- ts(df$timestamp)
Plot both the Time Series and ACF plots. Comment on the main features, and identify what (if any) assumptions of stationarity are violated. Additionally, comment if you believe the differenced data is more appropriate for use in fitting the data. Support your response with a graphical analysis.
Hint: Make sure to use the appropriate differenced data.
everest <- ts(df$temp, frequency = 24)
plot(everest, xlab = "Time", ylab = "Temperature", main = "Everest Hourly Temperature")
acf(everest, lag.max = 24 * 6, main = "Everest Hourly Temperature ACF")
Response: Comments about the time series and ACF plots of the
original time series
The time series plot shows that the data exhibits a fluctuating trend and clear hourly seasonality. The ACF plot shows autocorrelations that decay slowly and follow a cyclical rising-and-falling pattern, confirming the presence of trend and seasonality, respectively.
plot(diff(everest), xlab = "Time", ylab = "Temperature", main = "Everest Hourly Temperature: 1-Differenced")
acf(diff(everest), lag.max = 24 * 6, main = "Everest Hourly Temperature ACF: 1-Differenced")
plot(diff(everest, 24), xlab = "Time", ylab = "Temperature", main = "Everest Hourly Temperature: 24-Differenced")
acf(diff(everest, 24), lag.max = 24 * 6, main = "Everest Hourly Temperature ACF: 24-Differenced")
Response: Comments about the time series and ACF plots of the differenced time series
The plot of the first-order differenced data shows that the trend has been removed. The seasonality effect, however, still appears to be present: the first seasonal lag in the ACF is large and decays slowly over multiples of the seasonal period. Clearly, the first-order differenced data is not appropriate for fitting the seasonality in the data.
Since the first-order difference does not adequately address seasonality, we apply a lag-24 difference as shown above. The absence of a cyclical pattern in the ACF plot indicates that seasonality has largely been removed; however, there is still evidence of a trend in the data, given the presence of slowly decaying lags.
Separately fit a seasonality harmonic model and the ANOVA seasonality model to the temperature data. Evaluate the quality of each fit with residual analysis. Does one model perform better than the other? Which model would you select to fit the seasonality in the data?
times<-ts(df$timestamp)
Timereq<-times
Timereq2<-times^2
## Estimate seasonality using ANOVA approach
td_lm<- dynlm(everest ~ season(everest))
summary(td_lm)
##
## Time series regression with "ts" data:
## Start = 1(1), End = 90(24)
##
## Call:
## dynlm(formula = everest ~ season(everest))
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.3218 -2.1758 -0.0539 2.0218 11.2616
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.637733 0.419589 -20.586 < 2e-16 ***
## season(everest)2 -0.089922 0.593388 -0.152 0.879564
## season(everest)3 0.007367 0.593388 0.012 0.990096
## season(everest)4 1.270233 0.593388 2.141 0.032416 *
## season(everest)5 3.556589 0.593388 5.994 2.40e-09 ***
## season(everest)6 4.840344 0.593388 8.157 5.79e-16 ***
## season(everest)7 5.439567 0.593388 9.167 < 2e-16 ***
## season(everest)8 5.754467 0.593388 9.698 < 2e-16 ***
## season(everest)9 5.580767 0.593388 9.405 < 2e-16 ***
## season(everest)10 5.180078 0.593388 8.730 < 2e-16 ***
## season(everest)11 4.415444 0.593388 7.441 1.44e-13 ***
## season(everest)12 3.262233 0.593388 5.498 4.31e-08 ***
## season(everest)13 2.256178 0.593388 3.802 0.000147 ***
## season(everest)14 1.569878 0.593388 2.646 0.008214 **
## season(everest)15 1.310044 0.593388 2.208 0.027369 *
## season(everest)16 1.070478 0.593388 1.804 0.071371 .
## season(everest)17 0.818689 0.593388 1.380 0.167828
## season(everest)18 0.701811 0.593388 1.183 0.237053
## season(everest)19 0.522100 0.593388 0.880 0.379033
## season(everest)20 0.408722 0.593388 0.689 0.491028
## season(everest)21 0.266567 0.593388 0.449 0.653313
## season(everest)22 0.158144 0.593388 0.267 0.789872
## season(everest)23 0.031811 0.593388 0.054 0.957251
## season(everest)24 0.083378 0.593388 0.141 0.888269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.981 on 2136 degrees of freedom
## Multiple R-squared: 0.215, Adjusted R-squared: 0.2065
## F-statistic: 25.43 on 23 and 2136 DF, p-value: < 2.2e-16
plot(everest, type = "l")
lines(fitted(td_lm), col = "blue")
## Estimate seasonality using harmonic model
td_lm2 <- dynlm(everest ~ harmonic(everest))
summary(td_lm2)
##
## Time series regression with "ts" data:
## Start = 1(1), End = 90(24)
##
## Call:
## dynlm(formula = everest ~ harmonic(everest))
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.2278 -2.3106 -0.0262 2.2835 11.9682
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.62044 0.08746 -75.69 <2e-16 ***
## harmonic(everest)cos(2*pi*t) -1.40540 0.12369 -11.36 <2e-16 ***
## harmonic(everest)sin(2*pi*t) 2.22315 0.12369 17.97 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.065 on 2157 degrees of freedom
## Multiple R-squared: 0.1733, Adjusted R-squared: 0.1725
## F-statistic: 226.1 on 2 and 2157 DF, p-value: < 2.2e-16
plot(everest, type = "l")
lines(fitted(td_lm2), col = "purple")
## Residual Process: ANOVA seasonality model
resid.1 = residuals(td_lm)
## Residual Process: Harmonic seasonality model
resid.2 = residuals(td_lm2)
y.min = min(c(resid.1,resid.2))
y.max = max(c(resid.1,resid.2))
ts.plot(resid.1,lwd=2,ylab="Residual Process",col="blue", ylim=c(y.min,y.max))
lines(resid.2,col="purple")
legend(x=75,y=y.max,legend=c("ANOVA seasonality model","Harmonic seasonality model"),lty = 1, col=c("blue","purple"))
acf(resid.1,lag.max=24*6,main="ANOVA seasonality model")
acf(resid.2,lag.max=24*6,main="Harmonic seasonality model")
Response: Compare Seasonality Models
The regression coefficients for both models are largely statistically significant, indicating that both models capture a seasonal pattern. The two models perform similarly, except that the ANOVA model overestimates and the harmonic model underestimates the seasonality, based on a comparison of the fitted values.
The residual time series plots for both models show a fluctuating trend; this suggests that we will need to jointly fit both trend and seasonality. The ACF plots also show that the residuals are not stationary, with autocorrelations decreasing slowly, again suggesting the presence of a trend. The ANOVA model seems to capture seasonality better, since its residual ACF does not retain a seasonal pattern as the harmonic model's does.
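A numerical supplement to the visual comparison: information criteria trade off fit against parameter count. The sketch below uses simulated hourly data (an assumption, not the Everest series) and plain lm() stand-ins for the dynlm() fits above; on the real data one could simply call AIC(td_lm, td_lm2).

```r
# Hedged sketch on simulated data: compare an ANOVA-style (factor)
# seasonal fit against a single-harmonic fit with AIC (lower is better).
set.seed(42)
t    <- 1:480
hour <- factor(t %% 24)                          # hour-of-day factor
y    <- 5 * sin(2 * pi * t / 24) + rnorm(480)    # purely sinusoidal seasonality
fit.anova <- lm(y ~ hour)                                         # 24 coefficients
fit.harm  <- lm(y ~ sin(2 * pi * t / 24) + cos(2 * pi * t / 24))  # 3 coefficients
AIC(fit.anova, fit.harm)   # here the harmonic model wins: same signal, fewer parameters
```

On the Everest data the ranking could differ, since the true diurnal shape (see the coefficient table above) is not a single sinusoid.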
Using the time series data, fit the following models to estimate the trend with seasonality fitted using ANOVA:
Parametric Polynomial Regression
Non-parametric model
Overlay the fitted values on the original time series. Plot the residuals with respect to time. Plot the ACF of the residuals. Comment on the fit of the two models and on the appropriateness of the stationarity assumption for the residuals.
What form of modelling seems most appropriate and what implications might this have for how one might expect long term temperature data to behave? Provide explicit conclusions based on the data analysis.
time.pts = c(1:length(everest))
time.pts = c(time.pts - min(time.pts))/max(time.pts)
x1 = time.pts
x2 = time.pts^2
#Parametric Polynomial Regression
lm.fit2 = dynlm(everest~x1+x2+season(everest))
summary(lm.fit2)
##
## Time series regression with "ts" data:
## Start = 1(1), End = 90(24)
##
## Call:
## dynlm(formula = everest ~ x1 + x2 + season(everest))
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.0762 -2.0997 -0.0568 1.8439 10.9177
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.10809 0.42006 -7.399 1.96e-13 ***
## x1 -25.61966 1.02979 -24.879 < 2e-16 ***
## x2 21.77516 0.99751 21.829 < 2e-16 ***
## season(everest)2 -0.08804 0.51511 -0.171 0.86432
## season(everest)3 0.01113 0.51511 0.022 0.98276
## season(everest)4 1.27587 0.51511 2.477 0.01333 *
## season(everest)5 3.56408 0.51511 6.919 5.99e-12 ***
## season(everest)6 4.84969 0.51511 9.415 < 2e-16 ***
## season(everest)7 5.45075 0.51511 10.582 < 2e-16 ***
## season(everest)8 5.76748 0.51511 11.197 < 2e-16 ***
## season(everest)9 5.59560 0.51511 10.863 < 2e-16 ***
## season(everest)10 5.19673 0.51511 10.088 < 2e-16 ***
## season(everest)11 4.43390 0.51512 8.608 < 2e-16 ***
## season(everest)12 3.28248 0.51512 6.372 2.27e-10 ***
## season(everest)13 2.27821 0.51512 4.423 1.02e-05 ***
## season(everest)14 1.59368 0.51512 3.094 0.00200 **
## season(everest)15 1.33562 0.51512 2.593 0.00958 **
## season(everest)16 1.09781 0.51512 2.131 0.03319 *
## season(everest)17 0.84776 0.51512 1.646 0.09996 .
## season(everest)18 0.73262 0.51512 1.422 0.15510
## season(everest)19 0.55464 0.51512 1.077 0.28172
## season(everest)20 0.44298 0.51512 0.860 0.38991
## season(everest)21 0.30254 0.51512 0.587 0.55705
## season(everest)22 0.19582 0.51512 0.380 0.70388
## season(everest)23 0.07117 0.51512 0.138 0.89012
## season(everest)24 0.12442 0.51512 0.242 0.80916
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.455 on 2134 degrees of freedom
## Multiple R-squared: 0.409, Adjusted R-squared: 0.402
## F-statistic: 59.07 on 25 and 2134 DF, p-value: < 2.2e-16
temp.fit.lm.seas=fitted(lm.fit2)
ggplot(df, aes(timestamp, temp)) + geom_line() + xlab("Time") + ylab("Temperature Data")+
geom_line(aes(timestamp,temp.fit.lm.seas),lwd=1,col="blue")
#Non-parametric model
hr = as.factor(format(df$timestamp,"%H"))
gam.fit.seastr = gam(everest~s(time.pts)+hr)
summary(gam.fit.seastr)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## everest ~ s(time.pts) + hr
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.640172 0.270126 -31.986 < 2e-16 ***
## hr01 -0.089852 0.381998 -0.235 0.814064
## hr02 0.007526 0.381998 0.020 0.984282
## hr03 1.270502 0.381999 3.326 0.000896 ***
## hr04 3.556986 0.381999 9.312 < 2e-16 ***
## hr05 4.840889 0.382000 12.672 < 2e-16 ***
## hr06 5.440278 0.382001 14.242 < 2e-16 ***
## hr07 5.755364 0.382001 15.066 < 2e-16 ***
## hr08 5.581870 0.382002 14.612 < 2e-16 ***
## hr09 5.181406 0.382003 13.564 < 2e-16 ***
## hr10 4.417017 0.382005 11.563 < 2e-16 ***
## hr11 3.264069 0.382006 8.545 < 2e-16 ***
## hr12 2.258297 0.382008 5.912 3.94e-09 ***
## hr13 1.572299 0.382009 4.116 4.00e-05 ***
## hr14 1.312787 0.382011 3.437 0.000601 ***
## hr15 1.073562 0.382013 2.810 0.004995 **
## hr16 0.822133 0.382015 2.152 0.031502 *
## hr17 0.705635 0.382017 1.847 0.064867 .
## hr18 0.526323 0.382019 1.378 0.168429
## hr19 0.413364 0.382022 1.082 0.279356
## hr20 0.271646 0.382024 0.711 0.477119
## hr21 0.163680 0.382027 0.428 0.668365
## hr22 0.037823 0.382030 0.099 0.921142
## hr23 0.089886 0.382032 0.235 0.814012
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time.pts) 8.946 8.999 335.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.671 Deviance explained = 67.6%
## GCV = 6.6682 Scale est. = 6.5665 n = 2160
fit.gam.seastr = fitted(gam.fit.seastr)
ggplot(df, aes(timestamp, temp)) + geom_line() + xlab("Time") + ylab("Temperature Data")+
geom_line(aes(timestamp,fit.gam.seastr),col="purple")
resid.fit.lm = ts(resid(lm.fit2),frequency=24)
resid.fit.gam.seas = ts(resid(gam.fit.seastr),frequency=24)
y.min = min(c(resid.fit.lm,resid.fit.gam.seas))
y.max = max(c(resid.fit.lm,resid.fit.gam.seas))
ts.plot(resid.fit.lm,lwd=2,col="blue",ylim=c(y.min,y.max))
lines(resid.fit.gam.seas,col="purple")
legend(x=75,y=y.max,legend=c("Parametric Polynomial Model","Non-Parametric Model"),lty = 1, col=c("blue","purple"))
acf(resid.fit.lm,lag.max=24*6,main="Parametric Polynomial Model")
acf(resid.fit.gam.seas,lag.max=24*6,main="Non-Parametric Model")
Response: Model Comparison
From the fitted models, we see that the parametric polynomial regression imposes a smooth quadratic trend on the original data, while the seasonality is captured quite effectively. With the non-parametric model, the seasonality is also captured effectively, and the trend is fitted much better than with the polynomial model.
From the residual analysis of the two models, we see that the residuals of the parametric polynomial regression show somewhat larger variability. From the ACFs of the residuals, we see that the residuals from the non-parametric fit are closer to stationary, whereas those from the parametric model show stronger serial correlation.
The non-parametric trend model clearly works better for the temperature data. We can expect the temperature to follow a fluctuating trend, with seasonality on a daily basis; hence, this model can be used for predicting temperature.
Overall, we can see that daily temperature rises and falls over time (as expected); hence, a seasonality model alone is not sufficient to capture the variability in the data.